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Abstract: Heilongjiang province is the largest forest zone in China and 
the forest coverage rate is 46%. Forests of Heilongjiang province play an 
important role in the forest ecosystem of China. In this study we investi¬ 
gated the spatial distribution of forest carbon storage in Heilongjiang 
province using 3083 plots sampled in 2010. We attempted to fit two 
global models, ordinary least squares model (OLS) , linear mixed model 
(LMM), and a local model, geographically weighted regression model 
(GWR), to the relationship between forest carbon content and stand, 
environment, and climate factors. Five predictors significantly affected 
forest carbon storage and spatial distribution, viz. average diameter of 
stand (DBH), number of trees per hectare (TPH), elevation (Elev), slope 
(Slope) and the product of precipitation and temperature (RainTemp). 
The GWR model outperformed the two global models in both model 
fitting and prediction because it successfully reduced both spatial auto¬ 
correlation and heterogeneity in model residuals. More importantly, the 
GWR model provided localized model coefficients for each location in 
the study area, which allowed us to evaluate the influences of local stand 
conditions and topographic features on tree and stand growth, and forest 
carbon stock. It also helped us to better understand the impacts of silvi¬ 
cultural and management activities on the amount and changes of forest 
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carbon storage across the province. The detailed information can be 
readily incorporated with the mapping ability of GIS software to provide 
excellent tools for assessing the distribution and dynamics of the for¬ 
est-carbon stock in the next few years. 

Keywords: carbon content, biomass, global and local models, GWR 
model 

Introduction 

The global carbon cycle is a natural process involving matter and 
energy, and occurring in the earth biosphere and atmosphere 
(Falkowski et al. 2000; Fang et al. 2001), in which land vegeta¬ 
tion plays a key role (Granier et al. 2000; Law et al. 2001; 
Hazlett et al. 2005; Smith et al. 2006). In modern times, human 
activities caused increases in concentrations of atmospheric C0 2 , 
a greenhouse gas, from the burning of fossil fuels and this led to 
global warning. Development of technologies to control gas 
emissions to reduce the concentration of C0 2 has become a focal 
point for international research in many study areas (Houoatow 
1997; Neilson et al. 2007). Forest ecosystems are considered the 
main carbon sinks in the earth biosphere. Carbon sequestration 
by forests is a process of absorbing, collecting and, storing C0 2 
through photosynthesis (Kurz and Apps 1999; Heath and Smith 
2000; Phillips et al. 2000; Sohngen and Mendelsohn 2003; Baral 
and Guha 2004). The carbon storage of a forest is usually con¬ 
sidered to consist of several pools (Fahey et al. 2009). In this 
paper, forest carbon storage means carbon in the living trees, 
including stems, branches, foliage, and roots. To date, most re¬ 
search on forest carbon storage has been focused on forest bio¬ 
mass and carbon storage, but few studies have examined 
geo-spatial distribution of forest carbon storage, which is also an 
important factor affecting area-temperature, carbon emission, 
and the benefits of carbon sinks. For these reasons geo-spatial 
distribution of forest carbon storage has received increasing em¬ 
phasis in recent years. 

It is well known that the distribution of forest carbon storage is 
heterogeneous across geographical regions. Forest management 
regimes also have major impacts on forest carbon storage and 
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intensity. It is crucial and necessary to obtain accurate and timely 
information on the changes of forest carbon storage within and 
between regions for management planning and policy making. 
However, traditional methods such as geostatistics, conventional 
regression, and Kriging, using readily accessible remote sensing 
data (Myeong et al. 2006), are inadequate to provide accurate 
estimation and prediction on the spatial distribution of forest 
carbon storage because of the low correlation between the reflec¬ 
tive spectrum property of remote sensing and forest carbon stock 
(Sales et al. 2007; Du et al. 2010). More advanced spatial mod¬ 
eling methods have been developed and applied to improve and 
enhance the description and prediction of the spatial distributions 
of forest carbon storage, such as spatial lag model (SLM), spatial 
error model (SEM), spatially adaptive filtering model, and linear 
mixed model (LMM) (Anselin and Bera 1998; Zhang et al. 2009). 
These modeling techniques incorporate the spatial autocorrela¬ 
tions in the data into modeling processes to obtain better model 
parameters, and more importantly, unbiased estimation for the 
standard errors of model coefficients that improve statistical 
testing. These modelling methods have been applied in econom¬ 
ics (Chi and Qian 2009), population studies (Chi 2010), forestry 
(Zhang et al. 2009; Lu and Zhang 2011) and other fields. Al¬ 
though capable of reducing spatial autocorrelation for residual, 
these models are global in scope and applicable to very large 
study areas. They are therefore less efficient at assessing spatial 
heterogeneity in forest carbon storage (Zhang and Gove 2005; 
Zhang et al. 2005; 2009). 

Data on forest carbon storage are typically collected over large 
geographic regions. The differences between locations can be 
significant in terms of topographic features, species composition 
and richness, vegetation cover, and, consequently, forest biomass 
and carbon storage (i.e., spatial heterogeneity can be high). Each 
location is similar to adjacent neighboring areas, but dissimilar to 
distant areas. In recent years, localized summary statistics and 
modeling methods have been used to explore the effect of spatial 
heterogeneity in many studies (Pickett and Cadenasso 1995; 
Espa et al. 2013). In general, for any targeted location the de¬ 
scriptive statistics of response variables of interest are computed 
by weighting the neighboring locations within a user-defined 
bandwidth by a distance-decay function from the target location. 
The neighboring locations adjacent to the targeted location have 
more influence on the resulting descriptive statistics than do 
more distant locations. The localized descriptive statistics are 
therefore available for each location throughout the entire study 
area and this provides a direct way of assessing and visualizing 
the heterogeneity and scale dependence of ecological patterns 
across a geographic area (Brunsdon et al. 2002; Ma et al. 2011). 

Geographically weighted regression (GWR) has become a 
popular approach to depicting spatial heterogeneity in a regres¬ 
sion context (Brunsdon et al. 1996; Brunsdon et al. 1999; Foth- 
eringham et al. 2001; Fotheringham et al. 2003; Charlton et al. 
2009). GWR captures spatial variation by calibrating a regres¬ 
sion model fitted at each location in the study area and weighting 
all neighboring locations by a distance-decay function from the 
targeted location. Thus, GWR provides a set of localized model 
parameters of the regression model at each location so that local 
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spatial variation is fully incorporated into the modeling processes. 
With the rapid development of visualization tools of Geographic 
Information System (GIS), the model parameters, standard errors 
of the parameters, and model R 2 associated with each location 
can be mapped to illustrate spatial heterogeneity in the regression 
relationship under study. Consequently, the influence of mi¬ 
cro-environment features, vegetation competition, growth poten¬ 
tial, and the impacts of management activities on forest biomass 
and carbon can be evaluated, tested, modeled, and readily visu¬ 
alized (Zhang and Shi 2004; Zhang et al. 2004; Zhen et al. 2013). 

In this study, we attempted to depict the spatial distribution of 
forest carbon storage in Heilongjiang Province, P.R. China. The 
objectives of this study were (1) to fit the ordinary least squares 
(OLS) model, linear mixed model (LMM) and geographically 
weighted regression (GWR) model to the data of forest carbon 
storage; (2) to compare the fit and performance of the three 
models; (3) to assess the spatial autocorrelations and heterogene¬ 
ity of the model residuals; and (4) to validate the model predic¬ 
tions using an independent data set. 

Materials and methods 

Study Area 

Heilongjiang Province (121°11' E to 135°05' E and 43°25' N to 
53°33' N) is located in northeast China (Fig. 1) with a total land 
area of 454,000 km 2 . The terrain of the province slopes down¬ 
ward from the northwest to the southeast and includes four 
prominent mountain ranges, viz. Da Xing’an Mountain (Greater 
Xing’an Range), Xiao Xing’an Mountain (Lesser Xing’an 
Range), Zhangguangcai Range, and the Wanda Range. About 
36% of the province has elevation higher than 300 m. The San- 
jiang Plain in the northeast and the Songnen Plain in the west are 
important parts of the largest plain in China, covering 37% of the 
province at elevations ranging from 50 to 200 m. The study area 
has a northern temperate continental monsoon climate. The an¬ 
nual average temperatures are -5.0 °C-5.0 °C. The annual aver¬ 
age precipitation is 400-670 mm and 75% of the precipitation 
occurs between June and August. 

The province has a total of forest area over 20.8 million hec¬ 
tares (about 45.8% of the total land area of the province) and the 
forest coverage rate is 45.7%. Total volume of living trees was 
1.76 billion m 3 and the average volume per hectare was 78.6 m 3 . 
For timber forests, the area of young and mid-aged stands 
reached to 82.2%, while the mature and over-mature forests ac¬ 
counted for only 6.4% of the area (HLJFOREST. 2014). Major tree 
species include Korean pine ( Pinus koraiensis Sieb.et Zucc.), 
spruce ( Picea koraiensis Nakaki and Picea jezoensis (Sieb.et 
Zucc.) Carr.), fir ( Abies nephrolepis (Trautv.) Maxim.), larch 
(Larix gmelinii (Rupr.) Rupr. and Larix olgensis Henry), Mon¬ 
golian pine ( Pinus sylvestris .van mongolica Litv.), basswood 
(Tilia amurensis Rupr. and Tilia mandshurica Rupr.et Maxim.), 
Mongolian oak ( Quercus mongolica Fisch. ex Ledeb.), birch 
(Betula platyphylla Suk., Betula costata Trautv. and Betula da- 
vurica Pall.), maple ( Acer mono Maxim.), Manchurian ash 
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(Fraxinus mandshurica Rupr.), Manchurian walnut ( Juglas 
mandshurica Maxim.), Amur cork tree ( Phellodendron amu- 
rense Rupr.), poplar (. Populus davidiana Dode., Populus us- 
suriensis Korn.), elm ( Ulmus propinqua Koidzand Ulmus lacini- 


ata (Trautv.) Mayr.), and other minor species. Korean pine is the 
dominant species, spruce and fir dominate in high elevation areas 
(Zhou et al. 2013). 



Heilongjiang Province 


Legend 

North of Xiao Xing'an Mountain 
South of Xiao Xing'an Mountain 
West of Zhangguangcai Mountain 
East of Zhangguangcai Mountain 

Wanda Mountain 
Central plains 



Fig. 1: Map of the study area in Heilongjiang, China, and the geographical location of the six regions with the number of sample plots. 


Biomass data and climatic variables 

The study area was divided into six regions: (1) the northern 
region of the Xiao Xing’an Mountain, (2) the southern region of 
the Xiao Xing’an Mountain, (3) the western region of the 
Zhangguangcai Mountain, (4) the eastern region of the Zhang¬ 
guangcai Mountain, (5) Wanda Mountain, and (6) Central Plains. 
The stand and tree data used in this study were collected from 
these six regions in 2010 from 3038 permanent sample plots 
(each 0.06 ha in size) monitored by the Chinese National Forest 
Inventory (CNFI) and Key Ecological Benefit Forest in Heilong¬ 
jiang province. The geographic location and topographic de¬ 
scriptors were recorded for each plot. Stand and tree variables 
were measured and recorded to describe elevation (m), slope (°), 
slope aspect, number of tree species, number of trees by species, 
diameter at breast height (DBH) (cm), and individual height of 
all living trees (HT) (m). Other stand and tree variables were 
then computed on a per hectare basis including number of trees 
per hectare (TPH) (trees/ha), volume of living trees (m 3 /ha), 
average age of living trees (year), vegetation coverage (%), and 
others. The entire data set for 3038 plots was randomly split into 
model fitting data (n =2379 plots) and model testing data (m = 


659 plots). 

The carbon content of individual trees includes four compo¬ 
nents - stem, branch, foliage, and root. In a sample plot, the bio¬ 
mass models (unpublished) based on the Tang et al. method 
(2001) were used to compute the biomass of the four components 
(i.e., stem, branch, foliage, and root) for each species based on 
tree DBH. The carbon content of each component was then ob¬ 
tained by multiplying the component biomass by the corre¬ 
sponding component carbon content rate, which was tested with 
a C/N analyzer using the collected samples of each tree compo¬ 
nent. Thus, the individual tree carbon content was obtained by 
adding up the carbon contents of the four tree components. The 
sum of all individual trees in the same plot was the total carbon 
content (ton) for each sample plot, and the carbon content per 
hectare (t/ha) was calculated as the total carbon content of the 
sample plot divided by the plot area. 

One-year meteorological records were collected from the 59 
weather stations in the provinces of Heilongjiang, Jilin, and Inner 
Mongolia in 2010. In this paper, temperature and precipitation 
data were considered to be the main factors, as they influenced 
tree growth and, thereby, carbon sequestration of the forest. 
Kriging interpolation was used to obtain the temperature and 
precipitation data from these weather stations for each sample 
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plot in this study at interpolation precision >85%. 

The purpose of this study was to describe the spatial distribu¬ 
tion and pattern of forest carbon storage via regression models. 
The dependent variable was carbon content (C) for each plot (ton 
/ ha). A number of stand and tree variables were tested and five 
were selected as the predictors in the models: average diameter 
of living trees (DBH), number of trees per hectare (TPH), eleva¬ 
tion (Elev), slope (Slope), and the product of precipitation and 
temperature (RainTemp). DBH represented the size and/or age 
of the stand. TPH represented the density of the stand. Elev and 
Slope represented the geographic characteristics of the stand. 
The Rain Temp represented the climatic features of the stand. 

The basic statistics of the variables are listed in Table 1. 


Table 1: Descriptive statistics of variables used in this study 


Variables 

Number of 

Plots 

Mean 

Std 

Minimum 

Maximum 

C (ton / ha) 

3038 

34.4 

20.8 

0.5 

99.9 

DBH (cm) 

3038 

12.6 

3.6 

5.2 

28.3 

TPH (/ha) 

3038 

1272 

699 

83 

3717 

Elev (m) 

3038 

275 

114 

0 

740 

Slope (°) 

3038 

8 

6.6 

0 

45 

Rain Temp 

3038 

1322.7 

779.5 

-66.6 

2572.1 


Regression models 

In this study we used three regression models - ordinary least 
squares model, linear mixed model, and geographically weighted 
regression model. We briefly describe the modeling techniques 
below. 

(1) Ordinary Least Squares (OLS). Suppose we have a set of n 
observations on a dependent or response variable Y, and p inde¬ 
pendent or predictor variables X. The relationship between Y and 
X can be regressed using OLS as follows: 


y is a vector of unknown random-effects parameters, and s is the 
random error term. It is assumed: (1) E(y) = 0 and Var (y) = G is 
a covariance matrix of random-effects, (2) E(s) = 0 and Var (s) = 
R is a covariance matrix of model residuals, (3) Cov (y, s) = 0, 
and (4) both y and s are normally distributed, and (4) the vari¬ 
ance of Y is given by V = ZGZ ' + R and can be estimated 
by setting up the random-effects design matrix Z and specifying 
the covariance structures for G and R. In this case, OLS is no 
longer the best parameter estimation method, while maximum 
likelihood methods are usually used to obtain (3 and y (Little et al. 
2006). 

(3) Geographically Weighted Regression (GWR). To investi¬ 
gate the spatial variation of a regression relationship, the data 
must be collected with the location coordinates (w i? vj) for each 
observation i, i = 1,2, .. .,j, ..., n.j is neighboring observation of 
L The local information leads to estimate the localized regression 
coefficients of the relationship of interest (Fotheringham et al. 
2002). The underlying model for GWR is expressed as below: 

Y i = fio (v > v ,) + Z Pk ( u i ’ v i ) X ki + £ i (4) 

k =1 

where 7, is the response variable, X k is a set of p predictor vari- 
ables (k = 1, 2, p), fi 0 (u { , v;), /?, (u h v;), (u h v;) are the 

regression coefficients for the Mi predictor variable and ith loca¬ 
tion. Si is the random error term whose distribution is N(0, g 2 I), 
with I denoting an identity matrix. The aim of GWR is to obtain 
estimates of these functions for each predictor variable and each 
geographic location i. The estimation procedure of GWR is as 
follows: (i) draw a circle of a given radius around one particular 
location i (the center), (ii) compute a weight (wy) for each 
neighboring observation j according to the distance ^ between 
the location j and center /, and (iii) estimate the model coeffi¬ 
cients using weighted least-square regression such that 


Y = XP + £ 



P : =(X'W,xyx'W,Y 


where (3 is a vector of unknown fixed-effects model coefficients 
to be estimated from the data and s is the model error term which 
follows N(0, a 2 ). The OLS estimator can be obtained by 

P = {X'XY XT (2) 

The relationship represented by eq [1] is assumed to be universal 
or constant across the geographic area. 


where the weight matrix Wj is 





o A 
o 


0 0 • • • w J 

in J 




(2) Linear Mixed Model (LMM). LMM is expanded from OLS 
and can be expressed as 

Y = X/3 + Zy + s (3) 

where 7 is a vector of response variable, X is a matrix of 
fixed-effects predictors, ft is a vector of unknown fixed-effects 
model coefficients, Z is a known design matrix of random-effects, 
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The weighting function is defined by the kernel type and the size 
of kernel (bandwidth), which determines the geographical weight 
of the yth neighboring observation at the ith regression point. The 
weight should decrease gradually as the distance between i and j 
increases, until to a constant or zero. The estimation of model 
coefficients is highly related to kernel size, so the choice of ker¬ 
nel is an important consideration. If Wj = I (i.e., identity matrix), 
each observation in the data has a weight of unity and the GWR 



Journal of Forestry Research (2014) 25(2): 269-280 


273 


model is equivalent to the ordinary least squares model (Fother- 
ingham et al. 2002; Guo et al. 2008). 

Two types of kernel functions are commonly used: (1) Gaus¬ 
sian kernel with fixed bandwidth, and (2) adaptive methods with 
bi-square kernel. There are different approaches to determining 
the bandwidths of the spatial kernel functions. (1) Pre-selected 
bandwidth, (2) use the range of the variogram for the ordinary 
least-squares (OLS) model residuals, (3) minimizing Akaike’s 
Information Criterion (AIC), (4) minimizing cross-validation 
function (CV), (5) multiple bandwidths in an adaptive kernel 
function. In general, there is no significant difference between 
the bandwidths obtained from the cross-validation and AIC 
methods (Guo et al. 2008). 

Therefore, we obtain a set of estimates of spatially varying pa¬ 
rameters without specifying a function form for the spatial varia¬ 
tion. Essentially, GWR lets the data speak for themselves when 
estimating each regression coefficient (3 ik for each geographic 
location i and each independent variable k (Fotheringham et al. 
2002; Guo et al. 2008; Zhang and Gove 2005). 

Localized sample mean 

Given a geographic location with the coordinates (u i? v^ and a 
bandwidth, the spatial weight matrix W* (Eq. (6)) can be calcu¬ 
lated using a selected kernel function. Then, a descriptive statis¬ 
tic like the sample mean of any variable can be computed at each 
location i using the neighboring information as follows: 

n 

X i (u,.,v,.) = ^ 7i - (7) 

7=1 

The process moves from location to location across the study 
area until this localized sample mean is computed for each and 
every location. It describes the spatial heterogeneity of the vari¬ 
able at a local scale and can be visualized on a map to illustrate 
the spatial distribution of the variable (e.g., forest carbon storage) 
(Brunsdon et al. 2002; Ma et al. 2012). 

Model specification 

The following multiple linear model was used to regress forest 
carbon content (C) on the 5 predictors by the OLS, LMM and 
GWR models: 

C = j3 0 + d- DBH + p 2 ■ TPH+ A • Elev+ 

( 8 ) 

p 4 ■ Slope+p 5 ■ Rain _Temp+£ 

where /? 0 -/? 5 were regression coefficients to be estimated from 
data, and s was the model residual, which was defined as the 
difference between the observed and predicted C. In this study, 
the three models were fitted using the model fitting data (n = 
2379 plots). SAS 9.3 (SAS Institute Inc. 2011) was used for 


fitting both OLS and LMM models. The OLS model (Eq. (1)) 
was used as the benchmark for model comparison. For the LMM 
model (Eq. (3)), the six regions were treated as the fixed effects 
because the study was concerned with Heilongjiang Province 
only, whereas the spatial autocorrelations among the sample 
plots within the regions were modeled by the covariance matrix 
R (Little et al. 2006). The GWR model (Eq. (4)) was fitted by 
GWR 4.0 software available at the following link 
http://www.st-andrews.ac.uk/geoinformatics/, using an adaptive 
bisquare kernel function with the golden section search (auto¬ 
mated). The best bandwidth size was 25 km (Fotheringham et al. 
2002). The localized sample mean of the forest carbon was 
computed by GWR 3.0, using the same kernel function and 
bandwidth (25 km) as the GWR model above (Ma et al. 2012). 
GWR 4.0 added new prediction capability compared to GWR 3.0, 
but it eliminated functions that compute local statistics. So we 
used both GWR 3.0 and GWR 4.0. 

Model assessment 

The overall model fit was evaluated by three statistics including 
the coefficient of determination (R 2 ), mean squared error (MSE), 
and Akaike’s Information Criterion (AIC). However, both model 
R 2 and MSE may be biased because they are based on the as¬ 
sumption of independence of observations, which may be vio¬ 
lated by spatial data as analyzed in this study. They are at best 
approximate measures of the model fit. Hence, AIC is more ap¬ 
propriate because the likelihood function does not rely on the 
assumption of independent error terms (Littell et al. 2006). 

It is well known that spatial effects include spatial autocorrela¬ 
tion and heterogeneity. Ignoring spatial effects in a modeling 
process causes misleading significance tests and sub-optimal 
model prediction (Zhang et al. 2009). To investigate the spatial 
autocorrelations in the model residuals, both global and local 
spatial autocorrelation indices (i.e., Moran’s I) were computed 
for the residuals of OLS, LMM and GWR models. The global 
Moran’s / is defined by 

n n 

F0( V - x)(Xj - X ) 

' = - h - ( 9 ) 

i =1 7=1 i 

where x, and Xj represent the observed values of the species rich¬ 
ness changes at atlas blocks I and j, x is the average value of xi 
across the study area, and w^{d) is the spatial weight measure 
within a given bandwidth d. If distance between block j and sub¬ 
ject block i is smaller or equal to d, the w^{d)=\\ otherwise w^{d) 
=0. The expectation of global Moran’s / is The Moran’s 

/value will be larger than if positive spatial autocorrela¬ 

tion exists which indicates that similar values, either high or low 
values, are spatially clustered within a certain distance d. Values 
of Moran’s / below indicate negative spatial autocorrela¬ 

tion and the observed values tend to be dissimilar within distance 
d. Moran’s / approximates 0 when the observed values are ar- 
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ranged randomly and independently over space. 

We used a local indicator of spatial association (LISA) to al¬ 
low for the decomposition of global Moran’s I into localized 
values. The local form of Moran’s 7 Z is defined by: 

i, = (x, - w u (4)(y - *) ( io > 

j 

We used local Moran’s / 7 to detect the local clustering around 
an individual atlas block location and to test for spatial nonsta- 
tionarity. A positive local Moran’s /, implies a spatial clustering 
of similar values (either high or low) around an individual loca¬ 
tion, whereas a negative local Moran’s I t implies a clustering of 
dissimilar values around an individual location. 

In this paper, the global Moran’s / was computed for a range 
of bandwidths from 5 to 35 km at 5 km intervals, which de¬ 
scribed the overall spatial autocorrelation in the model residuals 
at different spatial scales. The local Moran’s /, one of the local 
spatial autocorrelation indices (LISAs), was computed at the 
optimal bandwidth (25 km) for each residual of OLS, LMM and 
GWR models. The spatial distribution of local Moran’s / was 
used to show the "hot spots” of residual clusters (i.e., the same 
sign of model residuals) and "cold spots" of residual clusters (i.e., 
the opposite sign of model residuals) across the study area. These 
“hot spots” indicated undesirable characteristics of the models 
(Zhang and Gove 2005, Zhang et al. 2005, 2009). 

To quantify the spatial heterogeneity of model residuals, 
intra-block spatial variances were computed for the residuals of 
OLS, LMM and GWR models at block sizes ranging from 5 to 
25 km at 5 km interval. The intra-block spatial variances were 
used to illustrate the local spatial variability in the model 
residuals for each model (Zhang et al. 2009). 

Model prediction and validation 

The data used in this study were split into model fitting and 
model testing data sets. A total of 659 plots were randomly se¬ 
lected for testing the ability of model prediction for the three 
models. The following statistics were used to evaluate model 
predictions: 


Mean squared error: 


m 


Z(C:-C, 

MSE = —- 


m — p 


(ii) 


Root mean squared error: 



RMSE = 



m 


( 12 ) 


where Q is the observed forest carbon of a testing plot, C i is 

the predicted forest carbon for the testing plot by each of the 
three models, p (=5) is the number of parameters, and m (= 659) 
is the size of the testing data set. 
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Results 

Spatial distribution of localized mean of forest carbon storage 

The local descriptive statistic (i.e., localized mean) of forest car¬ 
bon storage was computed using the optimal bandwidth 25 km 
across the study area (Fig. 2). Localized mean was a very effec¬ 
tive method to test the spatial patterns of change in forest carbon 
storage with the change of distribution of observations. It 
showed more detailed information and reflected the geographic 
distribution trend of forest carbon storage. Fig. 2 illustrates that 
the forest carbon storage in our study area was not evenly dis¬ 
tributed, but varied greatly from place to place. The northeast 
and south regions were mountainous with higher elevations, 
steeper slopes and denser forest stands so that forest carbon 
storage had much higher mean values (about 40 t/ha). In contrast, 
the west regions were located in the Central Plain at lower eleva¬ 
tions with flatter slopes and mostly included urban residential 
areas with low tree densities. This region had low overall forest 
carbon storage (<25 t/ha). It was evident that forest carbon stor¬ 
age differed significantly over the study area, illustrating hetero¬ 
geneous spatial distribution of forest carbon storage. 



Fig. 2 : Localized mean of the forest carbon storage. The dots are the 

sample plots 
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Model fitting and coefficients 

The response variable (forest carbon content) was regressed on 
the five predictors, average diameter (DBH), stand density (TPH), 
elevation (Elev), slope (Slope) and the product of precipitation 
and temperature (RainTemp). The overall model fit statistics are 
listed in Table 2. It was clear that the local model, GWR, fit the 
data much better with lower AIC, MSE and R 2 than the two 
global models (OLS and LMM). The LMM model also fit the 
data better than the OLS model because the LMM model incor¬ 
porated spatial heterogeneity via random effects across the six 
regions. 


Table 2: Model fitting and testing statistics of the three models (OLS, 
LMM and GWR). 


Models 

Model Fitting 

Model Residuals 

Model Testing 

AIC 

MSE 

R 2 

Moran’s I (Z) 

MSE 

Root-MSE 

OLS 

18091.8 

101.10 

0.74 

0.071 (23.09) 

100.378 

10.0189 

LMM 

17967.6 

95.01 

0.78 

0.019 (6.24) 

93.6024 

9.67483 

GWR 

17594.8 

68.94 

0.84 

-0.006 (-1.80) 

80.2304 

8.95714 


The estimates of model coefficients, standard errors (SE), and 
p-values of the five predictors are listed in Table 3 for the two 
global models (OLS and LMM). Each of the model coefficients 
was statistically significant at a = 0.05 for both models, except 

the estimate for jB 5 in the LMM model, which yielded a p 

value of 0.5362. j3 5 was the product of precipitation and tem¬ 
perature (Rain Temp). This showed that these climate features 
were similar across the six regions. It seemed that the standard 
errors of the slope coefficients of the OLS model were smaller 
than those of the LMM model, indicating that OLS might have 
under-estimated standard error due to violation of the assumption 
of independence of observations (spatially correlated forest car¬ 
bon content) (Table 3). 


Table 3: Model coefficient estimates, standard error (SE), and p-values 
of the OLS and LMM models. 


Statistics 

A 

A 

A 

K 

a 

A 

OLS model 







Estimate 

-54.3748 

5.1879 

0.0202 

0.0076 

0.1301 

-0.0010 

SE 

1.1623 

0.0628 

0.0003 

0.0019 

0.0342 

0.0003 

p-value 

<0.0001 

<0.0001 

<0.0001 

<0.0001 

0.0001 

0.0007 

J3-VS.D. 

-55.5371 

5.1251 

0.0199 

0.0057 

0.0959 

-0.0013 

J3+ 1 S.D. 

-53.2125 

5.2507 

0.0205 

0.0095 

0.1643 

-0.0007 

LMM model 







Estimate 

-54.7367 

5.0923 

0.0197 

0.0060 

0.1050 

0.0005 

SE 

1.2860 

0.0637 

0.0007 

0.0021 

0.0362 

0.0008 

p-value 

<.0001 

<.0001 

<.0001 

0.0041 

0.0038 

0.5362 

J3-VS.D. 

-56.0227 

5.0286 

0.0190 

0.0039 

0.0688 

-0.0003 

J3+ 1 S.D. 

-53.4507 

5.1560 

0.0204 

0.0081 

0.1412 

0.0013 


The GWR model produced model coefficients that varied geo¬ 
graphically, i.e., one set of model coefficients was produced for 


each geographic location. Table 4 summarizes the descriptive 
statistics of these local model coefficients, including mean, 
minimum, 25% quartile (Qi), median, 75% quartile (Q 3 ), and 
maximum. The medians of the GWR coefficients were similar to 
those produced by the OLS and LMM models. However, the 
localized GWR coefficients had wide ranges across the study 
area. This can be evaluated by the facts that 50% of the GWR 
coefficients fell between Ch (25% quartile) and Q 3 (75% quar¬ 
tile), while about 68% of the OLS or LMM coefficients, assum¬ 
ing normal distribution, should be within ±1 standard deviation. 
If the inter-quartile range of the GWR local coefficients is 
greater than that of ±1 standard deviation of the equivalent global 
parameter, the relationship under study might be spatially 

non-stationary (Fotheringham et al. 2003; Zhang et al. 2004). 

/\ 

For example, the inter-quartile range of the GWR local f5 x 

(DBH) was 4.9116-5.5745, which was outside the range of ±1 
standard deviation of the OLS model (5.1251-5.2507) and of the 
LMM model (5.0286-5.1560). Similarly, the inter-quartile range 

of the GWR local (3 2 (TPH) was 0.0177 - 0.0239, which was 

outside the range of ±1 standard deviation of the OLS model 
(0.0199 -0.0205) and of the LMM model (0.0190 - 0.0204) 
(Tables 3 and 4). Therefore, it was evident that the relationship 
between forest carbon content and five predictors was indeed 
spatially non-stationary or heterogeneous (Fotheringham et al. 
2003; Zhang et al. 2004). 


Table 4: Model coefficient estimates of the GWR model. 


Statistics 

A 

A 

A 

K 

A 

A 

Mean 

-62.0835 

5.1563 

0.0210 

0.0250 

-0.3593 

-0.0018 

Minimum 

-205.3903 

1.8654 

0.0060 

-0.1041 

-9.5495 

-0.1144 

Qi 

-70.1444 

4.9116 

0.0177 

0.0028 

-0.0461 

-0.0110 

Median 

-56.8204 

5.3015 

0.0207 

0.0119 

0.0240 

-0.0029 

q 3 

-47.9915 

5.5745 

0.0239 

0.0330 

0.1973 

0.0071 

Maximum 

55.0408 

7.9866 

0.0544 

0.5131 

2.3927 

0.0864 


The spatial variation of each model coefficient of the GWR 
model is illustrated in Fig. 3. The relationship between forest 
carbon and the 5 predictors varied from region to region across 

/v 

the study area. In general, the model coefficients of DBH ( f3 x ) 

/v 

and TPH (/? 2 ) were positive over the study area, while the 

/v p. 

model coefficients of Elev ( j8 3 ), Slope (// 4 ), and Rain Temp 

/v 

(A ) ranged from negative to positive depending on the region 

/v 

(Table 4). For example, in the west regions, p [ (DBH) was 

1.87-3.25, )3 2 (TPH) was 0.006-0.016, (ELev) was 

-0.10—0.01, /? 4 (Slope) was -9.55—7.69, and /3 5 

(Rain Temp) was -0.11 —0.07. In the southeastern regions, 

/?, (DBH) was 5.95-7.99, fi 2 (TPH) was 0.02-0.03, j3 3 

A 

(ELev) was 0.06-0.16, /? 4 (Slope) was -0.32-2.39, and (3 5 
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(RainTemp) was 0.05-0.09. Fig. 3 clearly demonstrates that the 
signs and magnitudes of these regression coefficients varied by 
geographic region, indicating the different impacts of the five 


predictors on the forest carbon content, depending on the forest 
conditions and topographic features. 


(a) 


(b) 



Legend 

Intercept 
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Fig. 3 : Contour maps of the model coefficient estimates from the GWR model at the band width 25 km: (a) Intercept A , (b) p x for DBH, (c) /? 2 

^ /V 

for TPH, (d) for Elev, (e) jj 4 for Slope, and (f) j3^ for Rain Temp. 


Spatial autocorrelation in model residuals 

Global Moran’s I and Z-values were computed for the model 
residuals of the three models based on a bandwidth of 25 km for 
GWR (Table 2). Z-values are simply standard deviations. Ac- 

^ Springer 


cording to the Z-value, the model residuals of both OLS and 
LMM had significant spatial autocorrelation (Z-value >2.58, 
significant at a = 0.01), while the spatial autocorrelation of the 
GWR residuals was not significant (Z-value <1.96, at a = 0.05). 
Thus, the local modeling technique, GWR, successfully reduced 
the spatial autocorrelation in model residuals (Table 2). 
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Fig. 4 illustrates the spatial correlogram of the model residuals 
for the three models across a range of lag distances (5-35 km). 
The residuals of the OLS model presented much larger spatial 
autocorrelations than did those of LMM and GWR at each lag 
distance tested. The LMM model significantly reduced spatial 
autocorrelation compared with OLS, but it was relatively large, 
especially at smaller spatial scales (5-15 km). On the other hand, 
the spatial autocorrelation of the GWR model residuals always 
approached zero at different spatial scales (Fig. 4). 



Fig. 4: Correlogram of the model residuals of the three models. 

Local Moran’s / was computed for residual from each of the 
three models. However, because the entire study area was too 
large and the number of plots too many to show for all plots, we 
chose to show local Moran’s / for the plantation plots only as a 
demonstration in Fig. 5. It was clear the values of local Moran’s / 
for the OLS model residuals were larger and more positive 
(black dots in the figure) across the area, indicating clusters (hot 
spots) of either under- or over-prediction by the OLS model re¬ 
siduals. The values of local Moran’s / for the GWR model re¬ 
siduals were much smaller and more negative (circles in the fig¬ 
ure), indicating the clusters (cold spots) of opposite signs of the 
model residuals, which was a more desirable characteristic of the 
model because neighboring residuals had opposite signs and 
might have resulted in a zero average for any local area (Zhang 
and Gove 2005; Zhang et al. 2005, 2009). 

Spatial heterogeneity in model residuals 

Fig. 6 illustrates the intra-block variances in model residuals for 
the three models, which represented the local spatial variability 
or heterogeneity for a given block size. The two global models 
(OLS and LMM) had similar sizes and patterns of the intra-block 
variances across different block sizes, indicating that although 
the LMM model reduced the spatial autocorrelation in the model 
residuals, it did not reduce spatial heterogeneity in the model 
residuals. On the other hand, the intra-block variances of the 
GWR model residuals were much smaller than those of the two 
global models at different spatial scales. Our results are consis¬ 
tent with those of other studies (e.g., Zhang et al. 2005, 2009). 



X(km) 


LMM 



X(km) 



X(ktn) 

Fig. 5: Spatial distribution of local Moran’s I of model residuals for the 
plantation plot: OLS, LMM and GWR. The size of the symbols (black 
dot and circle) is proportional to the values of Moran’s I. The black dots 
represent positive values (hot spots), and the circles represent negative 
values (cold spots). 



Fig. 6 : Intra-block variances of the model residuals of the three models. 
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Model prediction 

Table 2 shows the MSE and RMSE of model predictions for the 
model testing data for the three models. In terms of model 
RMSE, the GWR model reduced the model prediction error 
about 10.6% from the OLS model, and 7.2 % from the LMM 
model, while the LMM model reduced the prediction error 3.5% 
from the OLS model. 

Discussion 

In this study, we used both global and local regression models to 
investigate the relationship between forest carbon storage and 
stand, environment and weather factors or predictors. This in¬ 
formation can be used to predict forest carbon storage to provide 
useful information on forest management planning and decision 
making. One of the main purposes in this study was to learn the 
spatial variation of forest carbon storage across Heilongjiang 
province. 

Our five predictors had statistically significant effects on the 
amount and distribution of forest carbon storage. In both global 
models (OLS and LMM), DBH and TPH were the most impor¬ 
tant stand variables for our study area. The model coefficients 

/v /v 

DBH ( /3 X ) and TPH (/? 2 ) were both positive (Table 3), indi¬ 
cating that larger DBH and higher stand density would sequester 
more carbon in the forests. Our results are consistent with those 
of other studies. For example, Brown (2002) found that DBH 
alone explained more than 95% of the variation in aboveground 
carbon content in tropic forests. Among topographic factors, 
elevation and slope were most important. However, the model 

/v p. 

coefficients for Elev ( j3 3 ) and Slope () were positive (Table 

3), indicating higher elevations and steeper slopes would support 
more stored carbon, while other studies found negative coeffi¬ 
cients for elevation (Lai 2005). This can be explained in our 
study area by the fact that the stands at lower elevations on flatter 
slopes typically had lower stand densities due to human activities 
and thus had lower forest carbon content. For weather factors, 
the interaction of rainfall and temperature was more important 
than each factor alone in this study. Rainfall is always a strong 
factor influencing the growth of trees, and, consequently, carbon 
sequestration. Temperature directly affected rainfall, soil water 
content, tree growth and carbon storage. 

The discussion above was based on the model coefficients of 
the global models (OLS and LMM), which were constants across 
the study area. Given the diverse forest conditions and topog¬ 
raphic features of the stands, these constant model coefficients 
would not be able to accurately describe the impacts of the five 
predictors on forest carbon for different local areas. On the other 
hand, the GWR model provided localized model coefficients 
which can be visualized using GIS technology (Johnston et al. 
2001) to show more detailed information on the relationship 
between the forest carbon storage and predictors. Fig. 3 shows 
the spatial variations of the model coefficients. It is clear that the 
impacts of these predictors on forest carbon vary from region to 

4^ Springer 


region in the study area. In general, the model coefficients of 

/v /v 

DBH ( /3 X ) and TPH ( j3 2 ) were still positive over the study area, 
but the magnitudes of the coefficients varied substantially by 

region. The model coefficients Elev ( jS 3 ), Slope ( p A ), and 

/v 

RainTemp ( j3 5 ) ranged from negative to positive depending on 

the region. Thus, the constant model coefficients of the global 
models (OLS and LMM) not only provided biased prediction for 
a specific area, but also misrepresented the relationship between 
forest carbon stock and stand and environment factors. The lo¬ 
calized model coefficients of the GWR model provided detailed 
information on the influence of micro-site variation and the im¬ 
pacts of management activities on tree growth and forest carbon 
storage in aid to both management planning and decision making 
(Zhang and Shi 2004; Lu and Zhang 2012; Zhen et al. 2013). 

Analyses of spatial autocorrelation in the model residuals in¬ 
dicated that the violation of the OLS model assumption of inde¬ 
pendence of observations was clear and significant, and this led 
to biased hypothesis testing on the model coefficients. Although 
the LMM model was designed to deal with spatial autocorrela¬ 
tion, our results showed the LMM model residuals retained sig¬ 
nificant spatial autocorrelation (Table 2), especially at smaller 
spatial scales (Fig. 4), unlike in other studies (Zhang et al. 2005; 
2009). On the other hand, the GWR model successfully reduced 
levels of spatial autocorrelation in the model residuals while 
yielding desirable spatial patterns for model residuals (Fig. 6). 
However, GWR did not directly incorporate spatial autocorrela¬ 
tion into the modeling process because it assumed N(0, g 2 I) for 
the model error term, but GWR explicitly took spatial locations 
into account and emphasized local variation in the relationships 
between variables. Our results were consistent with findings in 
the literature (e.g., Zhang et al. 2005; 2009). 

The intra-block variances (local spatial variability) showed 
that the GWR model residuals had much smaller spatial hetero¬ 
geneity than those of the two global models across different 
block sizes. And the difference between LMM and OLS was not 
significant. This was also a bit surprising because the literature 
showed that LMM greatly reduces spatial heterogeneity in model 
residuals for tree growth data (e.g., Zhang et al. 2005; 2009). The 
reason may be that the spatial variation among the sample plots 
for forest carbon storage was much larger than the spatial varia¬ 
tion between stands of trees so that the LMM model in this study 
reduced to some degree both spatial autocorrelation and hetero¬ 
geneity. 

The GWR methodology also has weaknesses and limitations 
such as the influence of outliers (local parameter estimates can 
be strongly affected by a single outlier or outliers), weak data 
problem (a small number of neighbors available within a given 
bandwidth, especially if a “pre-selected” bandwidth is small), 
and lack of independence among the model parameter estimates 
(Zhang et al. 2004; Zhang and Shi 2004; Shi et al. 2006). Never¬ 
theless, given the variety and complexity of forest ecosystems, 
this local modeling technique does provide high-precision and 
visualizable information for planning silvicultural and manage¬ 
ment activities and reduces the cost and labor to obtain more 
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carbon. 

Conclusions 

In this study we investigated the spatial distribution of forest 
carbon storage in Heilongjiang Province, P.R. China, and the 
relationship between forest carbon stocks and five stand and 
environmental parameters, viz. tree diameter (DBH), number of 
trees per hectare (TPH), elevation (Elev), slope (Slope) and the 
product of precipitation and temperature (Rain Temp). We fitted 
two global models (OLS and LMM) and a local model (GWR). 
All predictors were statistically significant, indicating they were 
important factors influencing forest carbon storage. The two 
global models provided one predictive model for the entire study 
area and it might have inaccurately estimated forest carbon con¬ 
tent for local areas because the stand conditions and topographi¬ 
cal features varied greatly over the province. 

The GWR model significantly outperformed the two global 
models in model fitting and model prediction because it suc¬ 
cessfully reduced both spatial autocorrelation and heterogeneity 
in model residuals. Therefore, the model coefficients were 
available for each location and could be mapped to illustrate 
spatial heterogeneity in the regression relationship under study at 
different spatial scales. The GWR model could estimate forest 
carbon storage for sites located anywhere in Heilongjiang prov¬ 
ince, even where no plots were sampled. Consequently, the in¬ 
fluence of stand conditions and micro-environment features, as 
well as the impacts of silvicultural management on forest bio¬ 
mass and carbon can be evaluated, tested, modeled, and readily 
visualized. The localized and detailed information would be 
valuable for planning management regimes and decision making. 
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